release: atdata v0.3.0b1 #44
Merged
maxine-at-forecast merged 42 commits into foundation-ac:main on Jan 31, 2026
…hods to type checkers
The decorator now returns type[PackableSample] instead of type[_T]. Combined with @dataclass_transform(), this allows IDEs to recognize both:
- Original class fields (via dataclass_transform)
- PackableSample methods: packed, as_wds, from_bytes, from_data
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
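The typing pattern described in this commit can be illustrated with a minimal sketch. It is not the atdata decorator: `SampleBase`, `packable_sketch`, and `describe` are hypothetical stand-ins, and the real `PackableSample` methods are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

try:  # typing.dataclass_transform is available on Python 3.11+
    from typing import dataclass_transform
except ImportError:  # older interpreters: no-op stand-in with the same call shape
    def dataclass_transform(**_kwargs: Any) -> Callable[[Any], Any]:
        return lambda obj: obj


class SampleBase:
    """Hypothetical stand-in for PackableSample; one illustrative method only."""

    def describe(self) -> str:
        return f"{type(self).__name__}({vars(self)})"


@dataclass_transform()
def packable_sketch(cls: type) -> type[SampleBase]:
    """Make `cls` a dataclass and mix in SampleBase.

    dataclass_transform tells type checkers about the generated fields;
    the return annotation tells them about the SampleBase methods.
    """
    as_dataclass = dataclass(cls)
    # Rebuild the class so the mixed-in methods also exist at runtime.
    return type(cls.__name__, (as_dataclass, SampleBase), {})


@packable_sketch
class Point:
    x: int
    y: int


p = Point(1, 2)
print(p.x, p.describe())
```

How a given IDE combines the two sources of type information is up to the type checker; the sketch only shows the shape of the decorator.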
Add test and lint recipes to justfile, update CLAUDE.md to document all available just commands, and regenerate docs with updated quarto theme/styling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…reSQL, Redis) Refactor Index class to delegate persistence to an IndexProvider protocol. Extract existing Redis logic into RedisProvider and add SqliteProvider and PostgresProvider backends. LocalIndex is now a factory function that selects the backend by name. Adds optional `psycopg[binary]` dependency, provider tests, and updated changelog/docs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
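The delegation described above might look roughly like the following sketch. All names here (`ProviderSketch`, `SqliteSketchProvider`, `IndexSketch`, and the `put`/`get` methods) are assumptions made for illustration; the actual `IndexProvider` protocol and its method set are not shown in this PR page.

```python
from __future__ import annotations

import sqlite3
from typing import Optional, Protocol


class ProviderSketch(Protocol):
    """Hypothetical persistence interface the index delegates to."""

    def put(self, name: str, url: str) -> None: ...
    def get(self, name: str) -> Optional[str]: ...


class SqliteSketchProvider:
    """Illustrative SQLite-backed provider using only the standard library."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entries (name TEXT PRIMARY KEY, url TEXT NOT NULL)"
        )

    def put(self, name: str, url: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO entries (name, url) VALUES (?, ?)", (name, url)
            )

    def get(self, name: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT url FROM entries WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


class IndexSketch:
    """The index holds no storage logic; it forwards to whichever provider it is given."""

    def __init__(self, provider: ProviderSketch) -> None:
        self._provider = provider

    def add(self, name: str, url: str) -> None:
        self._provider.put(name, url)

    def lookup(self, name: str) -> Optional[str]:
        return self._provider.get(name)


index = IndexSketch(SqliteSketchProvider())
index.add("mnist", "s3://bucket/mnist.tar")
```

Because the index depends only on the protocol, a Redis or PostgreSQL backend can be swapped in without touching the index code, which appears to be the motivation for the refactor.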
Consolidate duplicated _parse_semver into _type_utils.py, replace bare except clauses and assert statements with specific exceptions, tighten generic pytest.raises(Exception) to exact types, and convert TODO comments to explanatory notes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… singleton
- Introduce Repository dataclass and _AtmosphereBackend in new repository.py module
- Extend Index with repos param for named repositories and atmosphere param for ATProto backend with lazy anonymous client default
- Add _resolve_prefix routing for @handle/dataset, atdata:// URIs, and repo/name prefixed references
- Add get_default_index/set_default_index singleton so load_dataset no longer requires an explicit index argument
- Deprecate AtmosphereIndex in favour of Index(atmosphere=client)
- Update exports, tests, and CHANGELOG
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
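The three reference forms named above (`@handle/dataset`, `atdata://` URIs, and `repo/name`) suggest a routing step along the following lines. This is a sketch under assumptions, not the `_resolve_prefix` implementation: the function name, the backend labels, the fallback to a local backend, and the exact error types are all hypothetical.

```python
from __future__ import annotations


def resolve_prefix_sketch(ref: str, repos: set[str]) -> tuple[str, str]:
    """Return (backend, remainder) for a dataset reference.

    Backend labels "atmosphere" and "local" are illustrative only.
    """
    if ref.startswith("atdata://"):
        return ("atmosphere", ref[len("atdata://"):])
    if ref.startswith("@"):
        if "/" not in ref:
            # An @handle needs a dataset part after the slash.
            raise ValueError(f"expected @handle/dataset, got {ref!r}")
        return ("atmosphere", ref[1:])
    head, sep, tail = ref.partition("/")
    if sep:
        if head not in repos:
            raise KeyError(f"unknown repository {head!r}")
        return (head, tail)
    # Assumption: an unprefixed name refers to the default local index.
    return ("local", ref)


print(resolve_prefix_sketch("lab/mnist", {"lab"}))
```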
…endency setup Index() now uses SqliteProvider when no provider, redis connection, or Redis kwargs are given. Explicit redis= or **kwargs still select Redis for backwards compatibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove backwards-compat dict-access methods from SchemaField and LocalSchemaRecord (unused since dataclass migration)
- Consolidate add_entry to delegate to _insert_dataset_to_provider, eliminating duplicate entry-creation logic
- Trim over-verbose module and class docstrings in local.py
- Narrow pytest.raises to exact IndexError for batch_size=0 test
- Add test coverage for prefix routing edge cases and error paths (atmosphere disabled, unknown repo, @handle without slash)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…chy (GH#38)
- Add _exceptions.py with AtdataError, LensNotFoundError, SchemaError, SampleKeyError, ShardError
- Add Dataset convenience API: __iter__, __len__, head, get, describe, schema, column_names, filter, map, select, to_pandas, to_dict
- Wire filter/map into ordered() and shuffled() via _post_wrap_stages
- LensNetwork.get_lens now raises LensNotFoundError with available targets
- Export new exceptions from atdata.__init__
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add atdata inspect: dataset summary with sample count, schema, shards
- Add atdata schema show/diff: display and compare dataset schemas
- Add atdata preview: print first N samples from a dataset
- Make LensNotFoundError inherit ValueError for backwards compatibility
- Update lens error message and corresponding test assertions
- Add test_dev_experience.py for new Dataset convenience methods
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
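The backwards-compatibility point about `LensNotFoundError` can be shown with a short sketch. The class names come from the commits above; the constructor signature, the message wording, and `get_lens_sketch` are assumptions for illustration.

```python
from __future__ import annotations


class AtdataError(Exception):
    """Base class for library errors (name taken from the commit message)."""


class LensNotFoundError(AtdataError, ValueError):
    """Also a ValueError, so existing `except ValueError` handlers keep working."""

    def __init__(self, target: str, available: list[str]) -> None:
        self.target = target
        self.available = sorted(available)
        super().__init__(
            f"no lens to {target!r}; available targets: {self.available}"
        )


def get_lens_sketch(lenses: dict[str, object], target: str) -> object:
    """Hypothetical lookup that reports the available targets on failure."""
    try:
        return lenses[target]
    except KeyError:
        raise LensNotFoundError(target, list(lenses)) from None


try:
    get_lens_sketch({"gray": object()}, "rgb")
except ValueError as exc:  # pre-existing handler style still catches it
    caught = exc
    print(exc)
```

Multiple inheritance lets new code catch `AtdataError` for all library failures while older call sites that caught `ValueError` are unaffected.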
…n (GH#38) Replace argparse-based CLI with typer for declarative command definitions, automatic help generation, and better subcommand support. Fix bug where get_schema() passed a LocalSchemaRecord to ensure_stub() instead of a plain dict, causing silent stub generation failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…used modules Decompose the 1955-line local.py into a local/ package with dedicated modules for entry types (_entry.py), schema models (_schema.py), S3 storage (_s3.py), the Index class (_index.py), and the deprecated Repo class (_repo_legacy.py). The __init__.py facade re-exports all public names to preserve backward compatibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ction into Index Absorb LocalIndex's string-based provider selection (sqlite/redis/postgres) directly into Index.__init__ via new `provider: str`, `path`, and `dsn` parameters. Remove the LocalIndex factory function and update all references across source, docstrings, and tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace TypeVar bounds and type annotations that reference the concrete PackableSample class with the Packable protocol across dataset, schema codec, atmosphere, local index, HF API, and tests. This decouples generic type parameters from the implementation class. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduce ManifestField annotation, ManifestBuilder, ShardManifest data model, ManifestWriter (JSON + parquet), QueryExecutor, and SampleLocation. Add Dataset.query() for two-phase manifest-based sample lookup. Integrate manifest generation into S3DataStore.write_shards() with optional manifest=True flag. Export all public types from atdata.__init__. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
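One plausible reading of "two-phase" lookup is sketched below: first prune whole shards using per-shard statistics, then scan only the surviving shards' manifest rows. This is an assumption about the design, not the `QueryExecutor` implementation; `ShardManifestSketch`, `query_sketch`, and the `(shard, offset)` result shape are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ShardManifestSketch:
    """Hypothetical per-shard manifest: one dict of indexed fields per sample."""

    shard: str
    rows: list[dict]

    def value_range(self, field: str) -> tuple[float, float]:
        values = [row[field] for row in self.rows]
        return min(values), max(values)


def query_sketch(
    manifests: list[ShardManifestSketch], field: str, lo: float, hi: float
) -> list[tuple[str, int]]:
    """Return (shard, offset) pairs for samples with lo <= field <= hi."""
    hits: list[tuple[str, int]] = []
    for manifest in manifests:
        # Phase 1: skip shards whose value range cannot overlap the query.
        shard_min, shard_max = manifest.value_range(field)
        if shard_max < lo or shard_min > hi:
            continue
        # Phase 2: scan rows of the remaining shards for exact matches.
        for offset, row in enumerate(manifest.rows):
            if lo <= row[field] <= hi:
                hits.append((manifest.shard, offset))
    return hits


manifests = [
    ShardManifestSketch("shard-000", [{"label": 0}, {"label": 1}, {"label": 2}]),
    ShardManifestSketch("shard-001", [{"label": 5}, {"label": 6}, {"label": 7}]),
    ShardManifestSketch("shard-002", [{"label": 9}, {"label": 9}]),
]
locations = query_sketch(manifests, "label", 2, 5)
```

In a real system the range statistics would be precomputed and stored in the manifest rather than recomputed per query.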
Add benchmarks/ package covering dataset I/O, index providers, query execution, and atmosphere operations. Include shared fixtures in conftest.py, pytest-benchmark dev dependency, and justfile commands (bench, bench-save, bench-compare) for running and comparing results. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…y, and stub manager Add dedicated test modules for previously low-coverage areas: test_cli.py, test_postgres_provider.py, test_query_coverage.py, test_repository_coverage.py, and test_stub_manager.py. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…I job Introduce per-category pytest markers (bench_serial, bench_index, bench_io, bench_query, bench_s3) with separate JSON exports. Add render_report.py for HTML report generation with median/IQR stats. Update justfile with per-category bench commands and report step. Add benchmark CI job to uv-test.yml. Use realistic data shapes (ImageNet uint8, timeseries float32) in conftest constants. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TestQueryIterationBenchmarks with equality, range, and large result set iteration benchmarks measuring end-to-end query-and-iterate cost. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…mples Drop unused imports (PackableSample, Optional, dataclasses, tqdm) from _hf_api, cli, and dataset modules. Shorten verbose docstrings in DictSample and Dataset to one-liners. Consolidate duplicate sample types across test files. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… dtype matching, strengthen assertions Replace np.save/np.load with compact struct-based binary format in _helpers.py (backward-compatible with legacy .npy deserialization). Strip redundant Protocol method docstrings in _protocols.py. Fix numpy_dtype_to_string to prefer exact match and sort substring keys longest-first to avoid "int8"/"uint8" ambiguity. Strengthen weak `assert X is not None` test assertions to verify actual values. Add test_type_utils.py for _type_utils coverage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
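The "int8"/"uint8" ambiguity mentioned above arises because "int8" is a substring of "uint8". A minimal pure-Python sketch of the exact-match-then-longest-first strategy follows; `dtype_to_string_sketch` and the `KNOWN` tuple are hypothetical and do not reproduce `numpy_dtype_to_string`.

```python
from __future__ import annotations

# Illustrative subset of dtype names; the real mapping is not shown in this PR page.
KNOWN = ("int8", "uint8", "int32", "float32", "float64")


def dtype_to_string_sketch(name: str, known: tuple[str, ...] = KNOWN) -> str:
    """Map a dtype description to a canonical name."""
    if name in known:  # prefer an exact match
        return name
    # Try longer keys first so "uint8" is tested before its substring "int8".
    for key in sorted(known, key=len, reverse=True):
        if key in name:
            return key
    raise ValueError(f"unrecognized dtype: {name!r}")


print(dtype_to_string_sketch("<class 'numpy.uint8'>"))
```

With an unsorted scan that happened to test "int8" first, the same input would be misreported as "int8".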
Remove release/* from push triggers and branch filter from pull_request to avoid double runs when PRs target main. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Release v0.3.0b1 is a major feature release that also includes infrastructure improvements:
- `ManifestField` annotations, `ManifestBuilder`, `ShardManifest`, `ManifestWriter` (JSON + parquet), `QueryExecutor`, `SampleLocation`, and `Dataset.query()` for two-phase shard-level filtering
- `atdata inspect`, `schema show/diff`, and `preview` commands
- `atdata.configure_logging` for structured logging, `PartialFailureError` + `Dataset.process_shards()` for shard-level error handling with retry, `atdata.testing` module with mock clients and fixtures
- Split `local.py` monolith into `local/` package (`_entry`, `_schema`, `_s3`, `_index`, `_repo_legacy`); remove `LocalIndex` factory in favor of `Index(provider="sqlite")`; consolidate string-based provider selection into `Index.__init__`
- Migrate type annotations from `PackableSample` to `Packable` protocol
- Compact struct-based binary format (replaces `np.save`/`np.load`), fix `numpy_dtype_to_string` longest-match ordering
- `pytest-benchmark` integration with per-category markers, HTML reports via `render_report.py`, CI benchmark job

Breaking changes
- `LocalIndex()` factory removed: use `Index(provider="sqlite")` or `Index(redis=conn)` directly
- `local.py` is now `local/` package (import paths unchanged via `__init__.py` facade)

Test plan
- `uv run pytest`: 823 passed, 33 skipped, 0 failed
- `just lint`: no ruff errors
- Benchmarks (`just bench`)

🤖 Generated with Claude Code